A GPU Oriented Data Partitioning Method to Overlap Communication and Computation
Related resources
Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster
In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation, and achieve a per...
Tiling on systems with communication/computation overlap
In the framework of fully permutable loops, tiling is a compiler technique (also known as ‘loop blocking’) that has been extensively studied as a source-to-source program transformation. Little work has been devoted to the mapping and scheduling of the tiles onto physical parallel processors. We present several new results in the context of limited computational resources and assuming communic...
Optimizing Metacomputing with Communication-Computation Overlap
In the framework of distributed object systems, this paper presents the concepts and an implementation of a mechanism for overlapping communication and computation. This mechanism decreases the execution time of a remote method invocation with large parameters. Its implementation and related experiments in the C++// language running on top of Globus and Nexus are described.
Overlap of Computation and Communication on Shared-Memory
This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from nonlocal memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses. The transformation peels from a ...
Automatic Computation and Data Partitioning on Scalable
Scalable shared memory multiprocessors are becoming increasingly popular platforms for high-performance scientific computing because they both scale to large numbers of processors and support the familiar shared memory abstraction. In order to improve application performance on these machines, it is essential to divide computation among processors and to place data carefully in the distributed s...
Journal
Journal title: Energy Procedia
Year: 2011
ISSN: 1876-6102
DOI: 10.1016/j.egypro.2011.12.523